Attention-Privileged Reinforcement Learning
Image-based Reinforcement Learning is known to suffer from poor sample
efficiency and generalisation to unseen visuals such as distractors
(task-independent aspects of the observation space). Visual domain
randomisation encourages transfer by training over visual factors of variation
that may be encountered in the target domain. This increases learning
complexity, can negatively impact learning rate and performance, and requires
knowledge of potential variations during deployment. In this paper, we
introduce Attention-Privileged Reinforcement Learning (APRiL) which uses a
self-supervised attention mechanism to significantly alleviate these drawbacks:
by focusing on task-relevant aspects of the observations, attention provides
robustness to distractors as well as significantly increased learning
efficiency. APRiL trains two attention-augmented actor-critic agents: one
purely based on image observations, available across training and transfer
domains; and one with access to privileged information (such as environment
states) available only during training. Experience is shared between both
agents and their attention mechanisms are aligned. The image-based policy can
then be deployed without access to privileged information. We experimentally
demonstrate accelerated and more robust learning on a diverse set of domains,
leading to improved final performance for environments both within and outside
the training distribution.
Comment: Published at Conference on Robot Learning (CoRL) 202
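The attention-alignment idea in the abstract above can be illustrated with a small sketch. The function names and the choice of a KL divergence between normalised attention maps are illustrative assumptions, not the paper's exact objective:

```python
import numpy as np

def softmax(x):
    """Normalise logits into an attention distribution (sums to 1)."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_alignment_loss(att_image, att_state, eps=1e-8):
    """KL(att_state || att_image): pulls the image-based agent's attention
    towards the privileged, state-based agent's attention.
    (Illustrative choice of divergence; APRiL's objective may differ.)"""
    p = att_state.ravel() + eps
    q = att_image.ravel() + eps
    return float(np.sum(p * np.log(p / q)))

# Toy 2x2 attention maps over an observation grid.
att_state = softmax(np.array([2.0, 0.5, 0.1, 0.1])).reshape(2, 2)
att_image = softmax(np.array([1.8, 0.6, 0.2, 0.1])).reshape(2, 2)

loss = attention_alignment_loss(att_image, att_state)
```

Minimising such a loss during joint training aligns the two agents' attention mechanisms; at deployment only the image-based agent (and its own attention) is required.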
TACO: Learning Task Decomposition via Temporal Alignment for Control
Many advanced Learning from Demonstration (LfD) methods consider the
decomposition of complex, real-world tasks into simpler sub-tasks. By reusing
the corresponding sub-policies within and between tasks, they provide training
data for each policy from different high-level tasks and compose them to
perform novel ones. Existing approaches to modular LfD focus either on learning
a single high-level task or depend on domain knowledge and temporal
segmentation. In contrast, we propose a weakly supervised, domain-agnostic
approach based on task sketches, which include only the sequence of sub-tasks
performed in each demonstration. Our approach simultaneously aligns the
sketches with the observed demonstrations and learns the required sub-policies.
This improves generalisation in comparison to separate optimisation procedures.
We evaluate the approach on multiple domains, including a simulated 3D robot
arm control task using purely image-based observations. The results show that
our approach performs commensurately with fully supervised approaches, while
requiring significantly less annotation effort.
Comment: 12 Pages. Published at ICML 201
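The core alignment step described above can be sketched as a Viterbi-style monotonic alignment of a task sketch to per-timestep sub-task scores, in the spirit of CTC forced alignment. The function name, interface, and dynamic program below are illustrative assumptions, not TACO's actual implementation:

```python
import numpy as np

def align_sketch(log_probs, sketch):
    """Monotonically align a task sketch (ordered sub-task ids) to a
    demonstration, given per-timestep sub-task log-probabilities.
    log_probs: (T, K) array; sketch: list of sub-task ids.
    Returns the best alignment score and a per-timestep assignment."""
    T, _ = log_probs.shape
    S = len(sketch)
    dp = np.full((T, S), -np.inf)     # dp[t, s]: best score ending in sketch[s] at t
    back = np.zeros((T, S), dtype=int)
    dp[0, 0] = log_probs[0, sketch[0]]
    for t in range(1, T):
        for s in range(min(t + 1, S)):
            stay = dp[t - 1, s]                       # remain in current sub-task
            move = dp[t - 1, s - 1] if s > 0 else -np.inf  # advance to next sub-task
            if move > stay:
                dp[t, s], back[t, s] = move, s - 1
            else:
                dp[t, s], back[t, s] = stay, s
            dp[t, s] += log_probs[t, sketch[s]]
    # Backtrack from the final sub-task at the final timestep.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    path.reverse()
    return dp[T - 1, S - 1], [sketch[s] for s in path]

# Toy demonstration: 4 timesteps, 2 sub-tasks, sketch = [0, 1].
log_probs = np.log(np.array([[0.9, 0.1],
                             [0.8, 0.2],
                             [0.2, 0.8],
                             [0.1, 0.9]]))
score, assignment = align_sketch(log_probs, [0, 1])
```

In a full method the assignment would provide (weak) labels for training the individual sub-policies, with alignment and policy learning iterated jointly.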
Discovering knowledge abstractions for sample efficient embodied transfer learning
This thesis concerns sample-efficient embodied machine learning. Machine learning success in sequential decision problems has been limited to domains with a narrow range of goals, requiring orders of magnitude more experience than humans, and the resulting agents lack the ability to generalise to new, related goals. In contrast, humans are continual learners. Given their embodiment and computational constraints, humans are forced to reuse knowledge (compressed abstractions of repeated structures present across their lifetime) to tackle novel scenarios in as sample-efficient and safe a manner as possible. Similar traits are desired in robotics, since robots are also embodied learners.

Taking inspiration from humans, the central claim of this thesis is that knowledge abstractions acquired from prior experience can be used to design domain-independent, sample-efficient algorithms that improve generalisation across modular domains. We refer to modular domains as Markov decision processes (MDPs) whose optimal policies can be obtained when reasoning and acting occur over compressed abstractions shared across them. The challenge is how to discover these abstractions sample-efficiently and with minimal supervision. Additionally, for embodied machine learning it is important that the approach supports continuous, potentially unbounded, state-action spaces.

Adhering to these constraints, we first develop novel self-supervised (Chapter 3) and weakly supervised (Chapter 4) knowledge-abstraction methods for domain adaptation, enabling zero-shot generalisation to unseen domains. We demonstrate their potential in robotic applications including sim2real transfer (Chapter 3) and generalisation using a human-robot command interface (Chapter 4). We then develop novel unsupervised knowledge-abstraction methods for transfer learning, enabling sample-efficient adaptation to unseen domains (Chapters 5 and 6), and highlight their relevance to robotics and continual learning. We introduce a hierarchical KL-regularised RL approach based on novel theory behind the transferability-expressivity trade-off of abstractions (Chapter 5) and develop the first, to our knowledge, bottleneck-options approach adhering to the aforementioned embodied machine learning constraints (Chapter 6).
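The KL-regularised RL objective referred to in the abstract typically takes the following standard form, where \(\pi_0\) is a (possibly hierarchical) default or prior policy capturing transferable structure, and \(\alpha\) trades off reward against staying close to that prior; the specific choice of \(\pi_0\) and regularisation is where the thesis's contribution lies, not this generic objective:

```latex
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t}
  \Big( r(s_t, a_t)
  - \alpha \,\mathrm{KL}\big(\pi(\cdot \mid s_t)\,\|\,\pi_0(\cdot \mid s_t)\big) \Big)\right]
```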